Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Free, publicly-accessible full text available February 13, 2026
-
Colijn and Plazzotta (2018) [1] described a bijective scheme for associating the unlabeled bifurcating rooted trees with the positive integers. In mathematical and biological applications of unlabeled rooted trees, however, nodes of rooted trees are sometimes multifurcating rather than bifurcating. Building on the bijection between the unlabeled bifurcating rooted trees and the positive integers, we describe bijective schemes for associating the unlabeled multifurcating rooted trees with the positive integers. We devise bijections with the positive integers for a set of trees in which each non-leaf node has exactly k child nodes, and for a set of trees in which each non-leaf node has at most k child nodes. The calculations make use of Macaulay's binomial expansion formula. The generalization to multifurcating trees can assist with the use of unlabeled trees for applications in evolutionary biology, such as the measurement of phylogenetic patterns of genetic lineages in pathogens.more » « less
-
Abstract Rooted binarygalledtrees generalize rooted binary trees to allow a restricted class of cycles, known asgalls. We build upon the Wedderburn-Etherington enumeration of rooted binary unlabeled trees withnleaves to enumerate rooted binary unlabeled galled trees withnleaves, also enumerating rooted binary unlabeled galled trees withnleaves andggalls,$$0 \leqslant g \leqslant \lfloor \frac{n-1}{2} \rfloor $$ . The enumerations rely on a recursive decomposition that considers subtrees descended from the nodes of a gall, adopting a restriction on galls that amounts to considering only the rooted binarynormalunlabeled galled trees in our enumeration. We write an implicit expression for the generating function encoding the numbers of trees for alln. We show that the number of rooted binary unlabeled galled trees grows with$$0.0779(4.8230^n)n^{-\frac{3}{2}}$$ , exceeding the growth$$0.3188(2.4833^n)n^{-\frac{3}{2}}$$ of the number of rooted binary unlabeled trees without galls. However, the growth of the number of galled trees with only one gall has the same exponential order 2.4833 as the number with no galls, exceeding it only in the subexponential term,$$0.3910n^{\frac{1}{2}}$$ compared to$$0.3188n^{-\frac{3}{2}}$$ . For a fixed number of leavesn, the number of gallsgthat produces the largest number of rooted binary unlabeled galled trees lies intermediate between the minimum of$$g=0$$ and the maximum of$$g=\lfloor \frac{n-1}{2} \rfloor $$ . We discuss implications in mathematical phylogenetics.more » « less
-
Abstract Objective In mathematical phylogenetics, a labeled rooted binary tree topology can possess any of a number of labeled histories, each of which represents a possible temporal ordering of its coalescences. Labeled histories appear frequently in calculations that describe the combinatorics of phylogenetic trees. Here, we generalize the concept of labeled histories from rooted phylogenetic trees to rooted phylogenetic networks, specifically for the class of rooted phylogenetic networks known as rooted galled trees . Results Extending a recursive algorithm for enumerating the labeled histories of a labeled tree topology, we present a method to enumerate the labeled histories associated with a labeled rooted galled tree. The method relies on a recursive decomposition by which each gall in a galled tree possesses three or more descendant subtrees. We exhaustively provide the numbers of labeled histories for all small galled trees, finding that each gall reduces the number of labeled histories relative to a specified galled tree that does not contain it. Conclusion The results expand the set of structures for which labeled histories can be enumerated, extending a well-known calculation for phylogenetic trees to a class of phylogenetic networks.more » « less
-
Mailler, Cécile; Wild, Sebastian (Ed.)Galled trees appear in problems concerning admixture, horizontal gene transfer, hybridization, and recombination. Building on a recursive enumerative construction, we study the asymptotic behavior of the number of rooted binary unlabeled (normal) galled trees as the number of leaves n increases, maintaining a fixed number of galls g. We find that the exponential growth with n of the number of rooted binary unlabeled normal galled trees with g galls has the same value irrespective of the value of g ≥ 0. The subexponential growth, however, depends on g; it follows c_g n^{2g-3/2}, where c_g is a constant dependent on g. Although for each g, the exponential growth is approximately 2.4833ⁿ, summing across all g, the exponential growth is instead approximated by the much larger 4.8230ⁿ.more » « less
-
The study of cultural evolution benefits from detailed analysis of cultural transmission in specific human domains. Chess provides a platform for understanding the transmission of knowledge due to its active community of players, precise behaviours and long-term records of high-quality data. In this paper, we perform an analysis of chess in the context of cultural evolution, describing multiple cultural factors that affect move choice. We then build a population-level statistical model of move choice in chess, based on the Dirichlet-multinomial likelihood, to analyse cultural transmission over decades of recorded games played by leading players. For moves made in specific positions, we evaluate the relative effects of frequency-dependent bias, success bias and prestige bias on the dynamics of move frequencies. We observe that negative frequency-dependent bias plays a role in the dynamics of certain moves, and that other moves are compatible with transmission under prestige bias or success bias. These apparent biases may reflect recent changes, namely the introduction of computer chess engines and online tournament broadcasts. Our analysis of chess provides insights into broader questions concerning how social learning biases affect cultural evolution.more » « less
-
How many ways are there to arrange the sequence of games in a single-elimination sports tournament? We consider the connection between this enumeration problem and the enumeration of “labeled histories,” or sequences of asynchronous branching events, in mathematical phylogenetics. The possibility of playing multiple games simultaneously in different arenas suggests an extension of the enumeration of labeled histories to scenarios in which multiple branching events occur simultaneously. We provide a recursive result enumerating game sequences and labeled histories in which simultaneity is allowed. For a March Madness basketball tournament of 68 labeled teams, the number of possible sequences of games is ∼1.91×10^78 if arbitrarily many arenas are available, but only ∼3.60×10^68 if all games must be played sequentially in the same arena.more » « less
-
Mixed-membership unsupervised clustering is widely used to extract informative patterns from data in many application areas. For a shared dataset, the stochasticity and unsupervised nature of clustering algorithms can cause difficulties in comparing clustering results produced by different algorithms, or even multiple runs of the same algorithm, as outcomes can differ owing to permutation of the cluster labels or genuine differences in clustering results. Here, with a focus on inference of individual genetic ancestry in population-genetic studies, we study the cost of misalignment of mixed-membership unsupervised clustering replicates under a theoretical model of cluster memberships. Using Dirichlet distributions to model membership coefficient vectors, we provide theoretical results quantifying the alignment cost as a function of the Dirichlet parameters and the Hamming permutation difference between replicates. For fixed Dirichlet parameters, the alignment cost is seen to increase with the Hamming distance between permutations. Datasets with low variance across individuals of membership coefficients for specific clusters generally produce high misalignment costs—so that a single optimal permutation has far lower cost than suboptimal permutations. Higher variability in data, as represented by greater variance of membership coefficients, generally results in alignment costs that are similar between the optimal permutation and suboptimal permutations. We demonstrate the application of the theoretical results to data simulated under the Dirichlet model, as well as to membership estimates from inference of human-genetic ancestry. The results can contribute to improving cluster alignment algorithms that seek to find optimal permutations of replicates. Supplementary materials for this article are available online.more » « less
-
Abstract Allele-sharing statistics for a genetic locus measure the dissimilarity between two populations as a mean of the dissimilarity between random pairs of individuals, one from each population. Owing to within-population variation in genotype, allele-sharing dissimilarities can have the property that they have a nonzero value when computed between a population and itself. We consider the mathematical properties of allele-sharing dissimilarities in a pair of populations, treating the allele frequencies in the two populations parametrically. Examining two formulations of allele-sharing dissimilarity, we obtain the distributions of within-population and between-population dissimilarities for pairs of individuals. We then mathematically explore the scenarios in which, for certain allele-frequency distributions, the within-population dissimilarity – the mean dissimilarity between randomly chosen members of a population – can exceed the dissimilarity between two populations. Such scenarios assist in explaining observations in population-genetic data that members of a population can be empirically more genetically dissimilar from each other on average than they are from members of another population. For a population pair, however, the mathematical analysis finds that at least one of the two populations always possesses smaller within-population dissimilarity than the value of the between-population dissimilarity. We illustrate the mathematical results with an application to human population-genetic data.more » « less
-
Interpretations of values of the F ST measure of genetic differentiation rely on an understanding of its mathematical constraints. Previously, it has been shown that F ST values computed from a biallelic locus in a set of multiple populations and F ST values computed from a multiallelic locus in a pair of populations are mathematically constrained as a function of the frequency of the allele that is most frequent across populations. We generalize from these cases to report here the mathematical constraint on F ST given the frequency M of the most frequent allele at a multiallelic locus in a set of multiple populations. Using coalescent simulations of an island model of migration with an infinitely-many-alleles mutation model, we argue that the joint distribution of F ST and M helps in disentangling the separate influences of mutation and migration on F ST . Finally, we show that our results explain a puzzling pattern of microsatellite differentiation: the lower F ST in an interspecific comparison between humans and chimpanzees than in the comparison of chimpanzee populations. We discuss the implications of our results for the use of F ST . This article is part of the theme issue ‘Celebrating 50 years since Lewontin's apportionment of human diversity’.more » « less
An official website of the United States government
